Clustering millions of tandem mass spectra.
نویسندگان
چکیده
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) with a capability to reduce the number of spectra submitted to further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases number of peptide identifications as compared to regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.
منابع مشابه
Spectral networks: a new approach to de novo discovery of protein sequences and posttranslational modifications.
Significant technological advances have accelerated high-throughput proteomics to the automated generation of millions of tandem mass spectra on a daily basis. In such a setup, the desire for greater sequence coverage combines with standard experimental procedures to commonly yield multiple tandem mass spectra from overlapping peptides-typical observations include peptides differing by one or t...
متن کاملPeptide identification by database search of mixture tandem mass spectra.
In high-throughput proteomics the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. Particularly, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions...
متن کاملValidation of endogenous peptide identifications using a database of tandem mass spectra.
The SwePep database is designed for endogenous peptides and mass spectrometry. It contains information about the peptides such as mass, pl, precursor protein and potential post-translational modifications. Here, we have improved and extended the SwePep database with tandem mass spectra, by adding a locally curated version of the global proteome machine database (GPMDB). In peptidomic experiment...
متن کاملProtein Identification Algorithms Developed from Statistical Analysis of MS/MS Fragmentation Patterns
Tandem mass spectrometry is widely used in proteomic studies because of its ability to identify large numbers of peptides from complex mixtures. In a typical LCMS/MS experiment, thousands of tandem mass spectra will be collected and peptide identification algorithms are of great importance to translate them into peptide sequences. Though these spectra contain both m/z and intensity values, most...
متن کاملPreprocessing of tandem mass spectra using machine learning methods
Protein identification has been more helpful than before in the diagnosis and treatment of many diseases, such as cancer, heart disease and HIV. Tandem mass spectrometry is a powerful tool for protein identification. In a typical experiment, proteins are broken into small amino acid oligomers called peptides. By determining the amino acid sequence of several peptides of a protein, its whole ami...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of proteome research
دوره 7 1 شماره
صفحات -
تاریخ انتشار 2008